14 research outputs found

    RHEA: an open-source Reproducible Hybrid-architecture flow solver Engineered for Academia

    The study of complex multiscale flows (Groen et al., 2014), such as the motion of small-scale turbulent eddies over large aerodynamic structures (Jofre & Doostan, 2022), microconfined high-pressure supercritical fluids for enhanced energy transfer (Bernades & Jofre, 2022), or hydrodynamic focusing of microorganisms in wall-bounded flows (Palacios et al., 2022), greatly benefits from the combination of interconnected theoretical, computational and experimental approaches. This manifold methodology provides a robust framework to corroborate the phenomena observed and validate the modeling assumptions utilized, and it facilitates the exploration of wider parameter spaces and the extraction of more sophisticated insights. These analyses are typically encompassed within the field of Predictive Science & Engineering (Njam, 2009), which has attracted attention in the Fluid Mechanics community and is expected to grow exponentially as computational studies transition from (mostly) physics simulations to active vectors for scientific discovery and technological innovation with the advent of Exascale computing (Alowayyed et al., 2017). In this regard, the computational flow solver presented aims at bridging the gap between studying complex multiscale flow problems and utilizing present and future state-of-the-art supercomputing systems in academic environments. The solver presented is named RHEA, which stands for open-source Reproducible Hybrid-architecture flow solver Engineered for Academia, and is available as an open-source Git repository at https://gitlab.com/ProjectRHEA/flowsolverrhea
    Peer Reviewed. Postprint (author's final draft)

    Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    High performance computing (HPC) systems are currently at a disruptive moment, with a variety of novel architectures and frameworks and no clarity about which will prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy consists of representing the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix–vector product (SpMV), the linear combination of vectors and the dot product. The main idea is to decompose the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
    Peer Reviewed. Postprint (author's final draft)
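The three algebraic kernels named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the CSR storage layout, the function names (`spmv`, `axpy`, `dot`, `periodic_laplacian_csr`, `euler_step`) and the toy problem (one explicit Euler step of 1D periodic diffusion with unit grid spacing) are all assumptions chosen for brevity.

```python
def spmv(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def axpy(alpha, x, y):
    """Linear combination of vectors: alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def periodic_laplacian_csr(n):
    """Hypothetical test operator: 1D periodic Laplacian (stencil 1, -2, 1), n >= 3."""
    row_ptr, col_idx, values = [0], [], []
    for i in range(n):
        for j, v in (((i - 1) % n, 1.0), (i, -2.0), ((i + 1) % n, 1.0)):
            col_idx.append(j)
            values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def euler_step(A, u, dt):
    """One explicit Euler step u + dt * (A u), written only with spmv and axpy."""
    return axpy(dt, spmv(*A, u), u)


A = periodic_laplacian_csr(4)
u_new = euler_step(A, [1.0, 0.0, 0.0, 0.0], 0.1)
energy = dot(u_new, u_new)  # a reduction such as a norm needs only the dot kernel
```

Because the time step touches the data only through these three functions, porting the sketch to another backend would mean reimplementing three kernels rather than the whole algorithm, which is the portability argument the abstract makes.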

    Efficient CFD code implementation for the ARM-based Mont-Blanc architecture

    Since 2011, the European project Mont-Blanc has been focused on enabling ARM-based technology for HPC, developing both hardware platforms and system software. The latest Mont-Blanc prototypes use system-on-chip (SoC) devices that combine a CPU and a GPU sharing a common main memory. Specific developments of parallel computing software and well-suited implementation approaches are of crucial importance for such a heterogeneous architecture in order to efficiently exploit its potential. This paper is devoted to the optimizations carried out in the TermoFluids CFD code to run it efficiently on the Mont-Blanc system. The underlying numerical method is based on an unstructured finite-volume discretization of the Navier–Stokes equations for the numerical simulation of incompressible turbulent flows. It is implemented using a portable and modular operational approach based on a minimal set of linear algebra operations. An architecture-specific heterogeneous multilevel MPI+OpenMP+OpenCL implementation of such kernels is proposed. It includes optimizations of the storage formats, dynamic load balancing between the CPU and GPU devices and hiding of communication overheads by overlapping computations and data transfers. A detailed performance study shows time reductions of up to on the kernels’ execution with the new heterogeneous implementation, its scalability on up to 128 Mont-Blanc nodes and the energy savings (around ) achieved with the Mont-Blanc system versus the high-end hybrid supercomputer MinoTauro.
    The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007–2013] and Horizon 2020 under the Mont-Blanc Project (www.montblanc-project.eu), grant agreements no. 288777, 610402 and 671697. The work has been financially supported by the Ministerio de Ciencia e Innovación, Spain (ENE-2014-60577-R), the Russian Science Foundation, project 15-11-30039, CONICYT Becas Chile Doctorado 2012, the Juan de la Cierva postdoctoral grant (IJCI-2014-21034), and the Initial Training Network SEDITRANS (GA number: 607394), implemented within the 7th Framework Programme of the European Commission under call FP7-PEOPLE-2013-ITN. Our calculations have been performed on the resources of the Barcelona Supercomputing Center. The authors thankfully acknowledge these institutions.
    Peer Reviewed. Postprint (published version)

    Optimising the Termofluids CFD code for petascale simulations

    This paper presents recent efforts to extend the scalability of the TermoFluids multi-physics Computational Fluid Dynamics (CFD) code, aiming to achieve petascale capacity for a single simulation. We describe the aspects of the code that we have improved in order to run it efficiently on 131,072 CPU-cores. This work has been developed using the BlueGene/Q Mira supercomputer of the Argonne Leadership Computing Facility, where we have obtained feedback at the targeted scale. In summary, this is a practical paper showing our experience in reaching the petascale paradigm for a single simulation with TermoFluids.
    Peer Reviewed. Postprint (author's final draft)

    Parallel SFC-based mesh partitioning and load balancing

    Modern supercomputers allow the simulation of complex phenomena with increased accuracy. This, in turn, requires finer geometric discretizations with larger numbers of mesh elements. In this context, and extrapolating to the Exascale paradigm, meshing operations such as generation, adaptation or partitioning become a critical issue within the simulation workflow. In this paper, we focus on mesh partitioning. In particular, we present some improvements carried out on an in-house parallel mesh partitioner based on the Hilbert Space-Filling Curve (SFC). Additionally, taking advantage of its performance, we present the application of the SFC-based partitioning to dynamic load balancing. This method is based on the direct monitoring of the imbalance at runtime and the subsequent re-partitioning of the mesh. The target weights for the optimized partitions are evaluated using a least-squares approximation considering all measurements from previous iterations. In this way, the final partition corresponds to the average performance of the computing devices engaged.
    This work is partially supported by the BSC-IBM Deep Learning Research Agreement, under JSA “Application porting, analysis and optimization for POWER and POWERAI”. It has also been partially supported by the EXCELLERAT project funded by the European Commission’s ICT activity of the H2020 Programme under grant agreement number: 823691. It has also received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement number: 846139 (Exa-FireFlows). This paper expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this paper. This work has also been financially supported by the Ministerio de Economia, Industria y Competitividad, of Spain (TRA2017-88508-R). The computing experiments of this paper have been performed on the resources of the Barcelona Supercomputing Center.
    Peer Reviewed. Postprint (author's final draft)
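The partitioning idea described in the abstract above can be sketched on a toy structured grid. This is an illustration under stated assumptions, not the in-house partitioner: the cells form an n x n grid with n a power of two, the Hilbert index uses the standard bitwise construction, and the load model (time proportional to the number of cells, fitted by least squares through the origin) is a simplification invented here. All function names are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-curve in this quadrant is oriented correctly
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def fit_speed(measurements):
    """Assumed load model t = c * cells; least-squares c over all samples, returned as 1/c."""
    num = sum(cells * t for cells, t in measurements)
    den = sum(cells * cells for cells, t in measurements)
    return den / num


def chunk_sizes(n_items, weights):
    """Sizes of contiguous chunks proportional to the target weights, summing to n_items."""
    total = sum(weights)
    bounds = [round(n_items * sum(weights[:p + 1]) / total) for p in range(len(weights))]
    return [b - a for a, b in zip([0] + bounds[:-1], bounds)]


def partition(n, weights):
    """Order the cells along the curve and cut the ordering into weighted contiguous chunks."""
    cells = sorted(((x, y) for x in range(n) for y in range(n)),
                   key=lambda c: hilbert_index(n, *c))
    parts, start = [], 0
    for size in chunk_sizes(len(cells), weights):
        parts.append(cells[start:start + size])
        start += size
    return parts


# Hypothetical timing history per device: (cells processed, seconds taken).
history = [[(100, 1.0), (200, 2.0)], [(100, 3.0), (200, 6.0)]]
parts = partition(4, [fit_speed(h) for h in history])
```

Cutting a one-dimensional ordering keeps re-partitioning cheap: when the measured speeds change, only the cut positions along the curve move, while the locality of the curve keeps each chunk spatially compact.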

    A hybrid parallel numerical model for wave-induced free-surface flow

    An advanced numerical model is presented for the simulation of wave-induced free-surface flow, utilizing an efficient hybrid parallel implementation. The model is based on the solution of the Navier–Stokes equations using large-eddy simulation of large-scale coastal free-surface flows. The three-dimensional immersed boundary method was used for the enforcement of the no-slip boundary condition on the bed surface. The water-air interface was tracked using the level-set method. The numerical model was validated against laboratory measurements involving wave propagation over a flat bed with an elliptical shoal, whose presence induces combined wave refraction and diffraction phenomena. The parallel implementation of the model enabled the efficient simulation of depth-resolved, wave-induced, three-dimensional, free-surface flow; the parallel efficiency and strong scaling of the model are quantitatively demonstrated.
    The present research has been co-financed by Greece and the European Union (European Social Fund—ESF) through the Operational Program “Human Resources Development, Education and Lifelong Learning”, in the framework of the “Supporting Postdoctoral Researchers-B cycle” (MIS 5033021) implemented by the State Scholarships Foundation (IKY).
    Peer Reviewed. Postprint (published version)

    Memory aware poisson solver for peta-scale simulations with one FFT diagonalizable direction

    Problems with some sort of divergence constraint are found in many disciplines: computational fluid dynamics, linear elasticity and electrostatics are examples thereof. Such a constraint leads to a Poisson equation, which is usually one of the most computationally intensive parts of scientific simulation codes. In this work, we present a memory-aware Poisson solver for problems with one Fourier diagonalizable direction. This diagonalization decomposes the original 3D system into a set of independent 2D subsystems. The proposed algorithm focuses on optimizing the memory allocations and transactions by taking into account redundancies in such 2D subsystems. Moreover, we also take advantage of the uniformity of the solver along the periodic direction for its vectorization. Additionally, our novel approach automatically optimizes the choice of the preconditioner used for the solution of each frequency subsystem and dynamically balances its parallel distribution. Altogether, this constitutes a highly efficient and robust HPC Poisson solver that has been successfully tested on up to 16384 CPU-cores.
    Peer Reviewed
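The Fourier diagonalization mentioned in the abstract above can be sketched one dimension lower than in the paper, to keep the example short: a 2D discrete Poisson problem that is periodic in z and has homogeneous Dirichlet boundaries in x decouples into independent 1D tridiagonal systems, one per frequency (the 3D case decouples into 2D subsystems in the same way). The unit grid spacing, the naive DFT, the Thomas solver and all function names are assumptions of this sketch, and none of the memory or preconditioner optimizations of the paper are reproduced.

```python
import cmath
import math


def dft(x, sign):
    """Naive discrete Fourier transform (sign=-1 forward, sign=+1 unnormalized inverse)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def thomas(diag, rhs):
    """Solve a tridiagonal system with constant diagonal `diag` and unit off-diagonals."""
    n = len(rhs)
    c, d = [0j] * n, [0j] * n
    c[0], d[0] = 1.0 / diag, rhs[0] / diag
    for i in range(1, n):
        denom = diag - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_poisson(f):
    """Solve the 5-point Laplacian system for f[i][k]; i is the x index, k the periodic z index."""
    nx, nz = len(f), len(f[0])
    f_hat = [dft(row, -1) for row in f]  # transform along the periodic direction
    u_hat = [[0j] * nz for _ in range(nx)]
    for m in range(nz):  # each frequency is an independent subsystem
        lam = 2.0 * math.cos(2.0 * math.pi * m / nz) - 2.0  # eigenvalue of the z stencil
        column = thomas(lam - 2.0, [f_hat[i][m] for i in range(nx)])
        for i in range(nx):
            u_hat[i][m] = column[i]
    return [[v.real / nz for v in dft(row, +1)] for row in u_hat]


def apply_laplacian(u):
    """Reference operator: 5-point stencil, zero Dirichlet values in x, periodic in z."""
    nx, nz = len(u), len(u[0])
    out = [[0.0] * nz for _ in range(nx)]
    for i in range(nx):
        for k in range(nz):
            left = u[i - 1][k] if i > 0 else 0.0
            right = u[i + 1][k] if i < nx - 1 else 0.0
            out[i][k] = left + right + u[i][(k - 1) % nz] + u[i][(k + 1) % nz] - 4.0 * u[i][k]
    return out
```

The sketch solves all nz subsystems for simplicity. For a real right-hand side, the coefficients at frequencies m and nz - m are complex conjugates, so roughly half of the solves would suffice.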

    HPC² - A fully-portable, algebra-based framework for heterogeneous computing. Application to CFD

    The variety of computing architectures competing in the exascale race makes the portability of codes of major importance. In this work, the HPC² code is presented as a fully-portable, algebra-based framework suitable for heterogeneous computing. In its application to CFD, the algorithm of the time-integration phase relies on a reduced set of only three algebraic operations: the sparse matrix-vector product, the linear combination of vectors and the dot product. This algebraic approach, combined with a multilevel MPI+OpenMP+OpenCL parallelization, naturally provides portability. The performance has been studied on different architectures including multicore CPUs, Intel Xeon Phi accelerators and GPUs from AMD and NVIDIA. The multi-GPU scalability is demonstrated on up to 256 devices. The heterogeneous execution is tested on a CPU+GPU hybrid cluster. Finally, results of the direct numerical simulation of a turbulent flow in a 3D air-filled differentially heated cavity are presented to show the capabilities of HPC² in dealing with large-scale CFD simulations.
    Peer Reviewed